import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Penguins dataset
penguins = sns.load_dataset("penguins")
penguins.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   species            344 non-null    object
 1   island             344 non-null    object
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object
dtypes: float64(4), object(3)
memory usage: 18.9+ KB
sns.pairplot(data=penguins)
plt.show()
/Applications/anaconda3/envs/cmpinf2100/lib/python3.8/site-packages/seaborn/axisgrid.py:123: UserWarning: The figure layout has changed to tight self._figure.tight_layout(*args, **kwargs)
Diamonds
diamonds = sns.load_dataset("diamonds")
diamonds.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   carat    53940 non-null  float64
 1   cut      53940 non-null  category
 2   color    53940 non-null  category
 3   clarity  53940 non-null  category
 4   depth    53940 non-null  float64
 5   table    53940 non-null  float64
 6   price    53940 non-null  int64
 7   x        53940 non-null  float64
 8   y        53940 non-null  float64
 9   z        53940 non-null  float64
dtypes: category(3), float64(6), int64(1)
memory usage: 3.0 MB
sns.pairplot(data=diamonds)
plt.show()
Wine data
wine_names = ["Cultivar", "Alcohol", "Malic_acid", "Ash", "Alcalinity_of_ash", "Magnesium", "Total_phenols",
"Flavanoids", "Nonflavanoid_phenols", "Proanthocyanin", "Color_intensity", "Hue", "OD280_OD315", "Proline"]
wine_data = pd.read_csv("wine_data.txt", delimiter=",", names=wine_names)
wine_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Cultivar              178 non-null    int64
 1   Alcohol               178 non-null    float64
 2   Malic_acid            178 non-null    float64
 3   Ash                   178 non-null    float64
 4   Alcalinity_of_ash     178 non-null    float64
 5   Magnesium             178 non-null    int64
 6   Total_phenols         178 non-null    float64
 7   Flavanoids            178 non-null    float64
 8   Nonflavanoid_phenols  178 non-null    float64
 9   Proanthocyanin        178 non-null    float64
 10  Color_intensity       178 non-null    float64
 11  Hue                   178 non-null    float64
 12  OD280_OD315           178 non-null    float64
 13  Proline               178 non-null    int64
dtypes: float64(11), int64(3)
memory usage: 19.6 KB
wine_data
| | Cultivar | Alcohol | Malic_acid | Ash | Alcalinity_of_ash | Magnesium | Total_phenols | Flavanoids | Nonflavanoid_phenols | Proanthocyanin | Color_intensity | Hue | OD280_OD315 | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 3 | 13.71 | 5.65 | 2.45 | 20.5 | 95 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740 |
| 174 | 3 | 13.40 | 3.91 | 2.48 | 23.0 | 102 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750 |
| 175 | 3 | 13.27 | 4.28 | 2.26 | 20.0 | 120 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835 |
| 176 | 3 | 13.17 | 2.59 | 2.37 | 20.0 | 120 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840 |
| 177 | 3 | 14.13 | 4.10 | 2.74 | 24.5 | 96 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560 |
178 rows × 14 columns
sns.pairplot(data=wine_data)
plt.show()
Sonar data
sonar_data = pd.read_csv("sonar_alldata.txt", sep=",", header=None)
sonar_data
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0200 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 | 0.3109 | 0.2111 | ... | 0.0027 | 0.0065 | 0.0159 | 0.0072 | 0.0167 | 0.0180 | 0.0084 | 0.0090 | 0.0032 | R |
| 1 | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 | 0.3337 | 0.2872 | ... | 0.0084 | 0.0089 | 0.0048 | 0.0094 | 0.0191 | 0.0140 | 0.0049 | 0.0052 | 0.0044 | R |
| 2 | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.2280 | 0.2431 | 0.3771 | 0.5598 | 0.6194 | ... | 0.0232 | 0.0166 | 0.0095 | 0.0180 | 0.0244 | 0.0316 | 0.0164 | 0.0095 | 0.0078 | R |
| 3 | 0.0100 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 | 0.0598 | 0.1264 | ... | 0.0121 | 0.0036 | 0.0150 | 0.0085 | 0.0073 | 0.0050 | 0.0044 | 0.0040 | 0.0117 | R |
| 4 | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.0590 | 0.0649 | 0.1209 | 0.2467 | 0.3564 | 0.4459 | ... | 0.0031 | 0.0054 | 0.0105 | 0.0110 | 0.0015 | 0.0072 | 0.0048 | 0.0107 | 0.0094 | R |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 203 | 0.0187 | 0.0346 | 0.0168 | 0.0177 | 0.0393 | 0.1630 | 0.2028 | 0.1694 | 0.2328 | 0.2684 | ... | 0.0116 | 0.0098 | 0.0199 | 0.0033 | 0.0101 | 0.0065 | 0.0115 | 0.0193 | 0.0157 | M |
| 204 | 0.0323 | 0.0101 | 0.0298 | 0.0564 | 0.0760 | 0.0958 | 0.0990 | 0.1018 | 0.1030 | 0.2154 | ... | 0.0061 | 0.0093 | 0.0135 | 0.0063 | 0.0063 | 0.0034 | 0.0032 | 0.0062 | 0.0067 | M |
| 205 | 0.0522 | 0.0437 | 0.0180 | 0.0292 | 0.0351 | 0.1171 | 0.1257 | 0.1178 | 0.1258 | 0.2529 | ... | 0.0160 | 0.0029 | 0.0051 | 0.0062 | 0.0089 | 0.0140 | 0.0138 | 0.0077 | 0.0031 | M |
| 206 | 0.0303 | 0.0353 | 0.0490 | 0.0608 | 0.0167 | 0.1354 | 0.1465 | 0.1123 | 0.1945 | 0.2354 | ... | 0.0086 | 0.0046 | 0.0126 | 0.0036 | 0.0035 | 0.0034 | 0.0079 | 0.0036 | 0.0048 | M |
| 207 | 0.0260 | 0.0363 | 0.0136 | 0.0272 | 0.0214 | 0.0338 | 0.0655 | 0.1400 | 0.1843 | 0.2354 | ... | 0.0146 | 0.0129 | 0.0047 | 0.0039 | 0.0061 | 0.0040 | 0.0036 | 0.0061 | 0.0115 | M |
208 rows × 61 columns
sonar_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       208 non-null    float64
 1   1       208 non-null    float64
 2   2       208 non-null    float64
 3   3       208 non-null    float64
 4   4       208 non-null    float64
 5   5       208 non-null    float64
 6   6       208 non-null    float64
 7   7       208 non-null    float64
 8   8       208 non-null    float64
 9   9       208 non-null    float64
 10  10      208 non-null    float64
 11  11      208 non-null    float64
 12  12      208 non-null    float64
 13  13      208 non-null    float64
 14  14      208 non-null    float64
 15  15      208 non-null    float64
 16  16      208 non-null    float64
 17  17      208 non-null    float64
 18  18      208 non-null    float64
 19  19      208 non-null    float64
 20  20      208 non-null    float64
 21  21      208 non-null    float64
 22  22      208 non-null    float64
 23  23      208 non-null    float64
 24  24      208 non-null    float64
 25  25      208 non-null    float64
 26  26      208 non-null    float64
 27  27      208 non-null    float64
 28  28      208 non-null    float64
 29  29      208 non-null    float64
 30  30      208 non-null    float64
 31  31      208 non-null    float64
 32  32      208 non-null    float64
 33  33      208 non-null    float64
 34  34      208 non-null    float64
 35  35      208 non-null    float64
 36  36      208 non-null    float64
 37  37      208 non-null    float64
 38  38      208 non-null    float64
 39  39      208 non-null    float64
 40  40      208 non-null    float64
 41  41      208 non-null    float64
 42  42      208 non-null    float64
 43  43      208 non-null    float64
 44  44      208 non-null    float64
 45  45      208 non-null    float64
 46  46      208 non-null    float64
 47  47      208 non-null    float64
 48  48      208 non-null    float64
 49  49      208 non-null    float64
 50  50      208 non-null    float64
 51  51      208 non-null    float64
 52  52      208 non-null    float64
 53  53      208 non-null    float64
 54  54      208 non-null    float64
 55  55      208 non-null    float64
 56  56      208 non-null    float64
 57  57      208 non-null    float64
 58  58      208 non-null    float64
 59  59      208 non-null    float64
 60  60      208 non-null    object
dtypes: float64(60), object(1)
memory usage: 99.2+ KB
sonar_data.shape
(208, 61)
diamonds_numeric_names = diamonds.select_dtypes("number").columns.tolist()
diamonds_numeric_names
['carat', 'depth', 'table', 'price', 'x', 'y', 'z']
diamonds_category_names = diamonds.select_dtypes("category").columns.tolist()
diamonds_category_names
['cut', 'color', 'clarity']
Reshape from WIDE to LONG!
diamonds_lf = diamonds.reset_index().\
    rename(columns={"index": "rowid"}).\
    melt(id_vars=["rowid"] + diamonds_category_names,
         value_vars=diamonds_numeric_names)
diamonds_lf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 377580 entries, 0 to 377579
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype
---  ------    --------------   -----
 0   rowid     377580 non-null  int64
 1   cut       377580 non-null  category
 2   color     377580 non-null  category
 3   clarity   377580 non-null  category
 4   variable  377580 non-null  object
 5   value     377580 non-null  float64
dtypes: category(3), float64(1), int64(1), object(1)
memory usage: 9.7+ MB
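To see what melt() does on a small scale, here is a minimal sketch with a made-up three-row frame. The `toy` data and its values are hypothetical and are not taken from the diamonds data. Each numeric column is stacked into variable/value pairs, so the long table has (rows) x (numeric columns) entries, which is why the diamonds result has 53940 x 7 = 377580 rows.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and two numeric columns.
toy = pd.DataFrame({
    "cut": ["Ideal", "Good", "Ideal"],
    "carat": [0.23, 0.31, 0.29],
    "price": [326, 335, 334],
})

# Keep a row identifier and the categorical column, then stack the numerics.
toy_lf = (
    toy.reset_index()
    .rename(columns={"index": "rowid"})
    .melt(id_vars=["rowid", "cut"], value_vars=["carat", "price"])
)

# 3 rows x 2 numeric columns gives 6 rows in the LONG format.
print(toy_lf.shape)
print(toy_lf)
```

The rowid column is what lets each long-format row be traced back to its original wide-format row.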
We can now pass the variable column of the LONG format data to the col argument to create the COLUMN FACETS!
diamonds_lf.variable.value_counts()
variable
carat    53940
depth    53940
table    53940
price    53940
x        53940
y        53940
z        53940
Name: count, dtype: int64
sns.displot(data=diamonds_lf, x="value", col="variable", col_wrap=3,
kind="hist",
facet_kws={"sharex": False, "sharey": False},
common_bins=False)
plt.show()
We can also study the CONDITIONAL DISTRIBUTIONS of the numeric columns GIVEN, or GROUPED BY, a categorical variable!
sns.catplot(data=diamonds_lf, x="color", y="value", col="variable", col_wrap=3,
kind="box", sharey=False, hue="color", palette="coolwarm")
plt.show()
We can study the CONDITIONAL MEANS, that is, the AVERAGE per GROUP!
sns.catplot(data=diamonds_lf, x="color", y="value", col="variable", col_wrap=3,
kind="point", sharey=False, hue="color", palette="coolwarm")
plt.show()
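The markers in a point plot sit at the per-group average (the mean is Seaborn's default estimator for point plots). As a rough numeric cross-check, the same conditional means can be computed from LONG format data with a groupby. The sketch below uses a small hypothetical frame, `toy_lf`, with the same column roles as diamonds_lf; the values are made up.

```python
import pandas as pd

# Hypothetical long-format data with the same column roles as diamonds_lf.
toy_lf = pd.DataFrame({
    "color": ["D", "D", "E", "E", "D", "D", "E", "E"],
    "variable": ["carat"] * 4 + ["price"] * 4,
    "value": [0.2, 0.4, 0.3, 0.5, 300.0, 500.0, 400.0, 800.0],
})

# Average of each original numeric column within each color group.
group_means = (
    toy_lf.groupby(["variable", "color"])["value"]
    .mean()
    .unstack("color")
)
print(group_means)
```

Each row of the result corresponds to one facet, and each column to one position on that facet's x-axis.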
You always need to examine the SHAPE of the distribution!!
sns.catplot(data=diamonds_lf, x="color", y="value", col="variable", col_wrap=3,
kind="violin", sharey=False, hue="color", palette="coolwarm")
plt.show()
Sonar
sonar_data.shape
(208, 61)
sonar_data.dtypes.value_counts()
float64    60
object      1
Name: count, dtype: int64
sonar_data.dtypes
0 float64
1 float64
2 float64
3 float64
4 float64
...
56 float64
57 float64
58 float64
59 float64
60 object
Length: 61, dtype: object
sonar_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       208 non-null    float64
 1   1       208 non-null    float64
 2   2       208 non-null    float64
 3   3       208 non-null    float64
 4   4       208 non-null    float64
 5   5       208 non-null    float64
 6   6       208 non-null    float64
 7   7       208 non-null    float64
 8   8       208 non-null    float64
 9   9       208 non-null    float64
 10  10      208 non-null    float64
 11  11      208 non-null    float64
 12  12      208 non-null    float64
 13  13      208 non-null    float64
 14  14      208 non-null    float64
 15  15      208 non-null    float64
 16  16      208 non-null    float64
 17  17      208 non-null    float64
 18  18      208 non-null    float64
 19  19      208 non-null    float64
 20  20      208 non-null    float64
 21  21      208 non-null    float64
 22  22      208 non-null    float64
 23  23      208 non-null    float64
 24  24      208 non-null    float64
 25  25      208 non-null    float64
 26  26      208 non-null    float64
 27  27      208 non-null    float64
 28  28      208 non-null    float64
 29  29      208 non-null    float64
 30  30      208 non-null    float64
 31  31      208 non-null    float64
 32  32      208 non-null    float64
 33  33      208 non-null    float64
 34  34      208 non-null    float64
 35  35      208 non-null    float64
 36  36      208 non-null    float64
 37  37      208 non-null    float64
 38  38      208 non-null    float64
 39  39      208 non-null    float64
 40  40      208 non-null    float64
 41  41      208 non-null    float64
 42  42      208 non-null    float64
 43  43      208 non-null    float64
 44  44      208 non-null    float64
 45  45      208 non-null    float64
 46  46      208 non-null    float64
 47  47      208 non-null    float64
 48  48      208 non-null    float64
 49  49      208 non-null    float64
 50  50      208 non-null    float64
 51  51      208 non-null    float64
 52  52      208 non-null    float64
 53  53      208 non-null    float64
 54  54      208 non-null    float64
 55  55      208 non-null    float64
 56  56      208 non-null    float64
 57  57      208 non-null    float64
 58  58      208 non-null    float64
 59  59      208 non-null    float64
 60  60      208 non-null    object
dtypes: float64(60), object(1)
memory usage: 99.2+ KB
I do not like to use NUMBERS as column names!
Let's change them using a LIST COMPREHENSION!
Let's change the column names to the pattern X00, X01, X02, ...
"X%02d" % 0
'X00'
"X%02d" % 10
'X10'
["X%02d" % d for d in sonar_data.columns]
['X00', 'X01', 'X02', 'X03', 'X04', 'X05', 'X06', 'X07', 'X08', 'X09', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20', 'X21', 'X22', 'X23', 'X24', 'X25', 'X26', 'X27', 'X28', 'X29', 'X30', 'X31', 'X32', 'X33', 'X34', 'X35', 'X36', 'X37', 'X38', 'X39', 'X40', 'X41', 'X42', 'X43', 'X44', 'X45', 'X46', 'X47', 'X48', 'X49', 'X50', 'X51', 'X52', 'X53', 'X54', 'X55', 'X56', 'X57', 'X58', 'X59', 'X60']
sonar_data.columns = ["X%02d" % d for d in sonar_data.columns]
sonar_data.columns
Index(['X00', 'X01', 'X02', 'X03', 'X04', 'X05', 'X06', 'X07', 'X08', 'X09',
'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19',
'X20', 'X21', 'X22', 'X23', 'X24', 'X25', 'X26', 'X27', 'X28', 'X29',
'X30', 'X31', 'X32', 'X33', 'X34', 'X35', 'X36', 'X37', 'X38', 'X39',
'X40', 'X41', 'X42', 'X43', 'X44', 'X45', 'X46', 'X47', 'X48', 'X49',
'X50', 'X51', 'X52', 'X53', 'X54', 'X55', 'X56', 'X57', 'X58', 'X59',
'X60'],
dtype='object')
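The % formatting used above is one of several equivalent spellings. As a small sketch, an f-string or str.format produces the same zero-padded names. Here `n_cols` is a stand-in for the 61 integer column labels and is not a variable from the notebook.

```python
# Stand-in for the integer column labels 0..60 of the sonar data.
n_cols = 61

percent_style = ["X%02d" % d for d in range(n_cols)]
fstring_style = [f"X{d:02d}" for d in range(n_cols)]
format_style = ["X{:02d}".format(d) for d in range(n_cols)]

# All three spellings give identical names.
print(percent_style == fstring_style == format_style)
print(fstring_style[:3], fstring_style[-1])
```

The 02d part of the format pads with zeros to a width of two digits, which keeps the names in the right order when they are sorted as strings.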
sonar_data["X00"]
0 0.0200
1 0.0453
2 0.0262
3 0.0100
4 0.0762
...
203 0.0187
204 0.0323
205 0.0522
206 0.0303
207 0.0260
Name: X00, Length: 208, dtype: float64
We need to extract the numeric column names.
sonar_numeric_names = sonar_data.select_dtypes("number").columns.to_list()
sonar_numeric_names
['X00', 'X01', 'X02', 'X03', 'X04', 'X05', 'X06', 'X07', 'X08', 'X09', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20', 'X21', 'X22', 'X23', 'X24', 'X25', 'X26', 'X27', 'X28', 'X29', 'X30', 'X31', 'X32', 'X33', 'X34', 'X35', 'X36', 'X37', 'X38', 'X39', 'X40', 'X41', 'X42', 'X43', 'X44', 'X45', 'X46', 'X47', 'X48', 'X49', 'X50', 'X51', 'X52', 'X53', 'X54', 'X55', 'X56', 'X57', 'X58', 'X59']
sonar_category_names = sonar_data.select_dtypes("object").columns.to_list()
sonar_category_names
['X60']
RESHAPE from WIDE to LONG format!!
sonar_lf = sonar_data.reset_index().\
    rename(columns={"index": "rowid"}).\
    melt(id_vars=["rowid"] + sonar_category_names, value_vars=sonar_numeric_names)
sonar_lf
| | rowid | X60 | variable | value |
|---|---|---|---|---|
| 0 | 0 | R | X00 | 0.0200 |
| 1 | 1 | R | X00 | 0.0453 |
| 2 | 2 | R | X00 | 0.0262 |
| 3 | 3 | R | X00 | 0.0100 |
| 4 | 4 | R | X00 | 0.0762 |
| ... | ... | ... | ... | ... |
| 12475 | 203 | M | X59 | 0.0157 |
| 12476 | 204 | M | X59 | 0.0067 |
| 12477 | 205 | M | X59 | 0.0031 |
| 12478 | 206 | M | X59 | 0.0048 |
| 12479 | 207 | M | X59 | 0.0115 |
12480 rows × 4 columns
sonar_lf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12480 entries, 0 to 12479
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   rowid     12480 non-null  int64
 1   X60       12480 non-null  object
 2   variable  12480 non-null  object
 3   value     12480 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 390.1+ KB
sonar_lf.variable.value_counts()
variable
X00    208
X01    208
X32    208
X33    208
X34    208
X35    208
X36    208
X37    208
X38    208
X39    208
X40    208
X41    208
X42    208
X43    208
X44    208
X45    208
X46    208
X47    208
X48    208
X49    208
X50    208
X51    208
X52    208
X53    208
X54    208
X55    208
X56    208
X57    208
X58    208
X31    208
X30    208
X29    208
X14    208
X02    208
X03    208
X04    208
X05    208
X06    208
X07    208
X08    208
X09    208
X10    208
X11    208
X12    208
X13    208
X15    208
X28    208
X16    208
X17    208
X18    208
X19    208
X20    208
X21    208
X22    208
X23    208
X24    208
X25    208
X26    208
X27    208
X59    208
Name: count, dtype: int64
We can now use Seaborn to create one FACET for each unique value of variable, which lets us examine the original wide format numeric columns!
sns.displot(data=sonar_lf, x="value", col="variable", col_wrap=5,
facet_kws={"sharex": False, "sharey": False}, common_bins=False,
kind="hist")
plt.show()
We can also create the CONDITIONAL KDE, where we COLOR by the OBJECT COLUMN!
sns.displot(data=sonar_lf, x="value", col="variable", hue="X60", col_wrap=5,
facet_kws={"sharex": False, "sharey": False}, common_norm=False,
kind="kde")
plt.show()
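The common_norm=False argument matters when the groups have different sizes. With common_norm=True, Seaborn scales each conditional density by its share of the observations so that the total area across groups is 1; with common_norm=False, each group's density is normalized on its own. The sketch below illustrates that idea with NumPy histograms rather than a KDE, on hypothetical simulated groups (`group_r`, `group_m`), so it is an analogy for the normalization and not Seaborn's own computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group sample with unequal group sizes.
group_r = rng.normal(loc=0.0, scale=1.0, size=300)
group_m = rng.normal(loc=2.0, scale=1.0, size=100)
n_total = group_r.size + group_m.size

# Bin edges wide enough to contain every simulated value.
lo = min(group_r.min(), group_m.min())
hi = max(group_r.max(), group_m.max())
bins = np.linspace(lo, hi, 41)
width = np.diff(bins)

# Independent normalization (the idea behind common_norm=False):
# each group's density has area 1 on its own.
dens_r, _ = np.histogram(group_r, bins=bins, density=True)
dens_m, _ = np.histogram(group_m, bins=bins, density=True)
area_r = (dens_r * width).sum()
area_m = (dens_m * width).sum()

# Common normalization (the idea behind common_norm=True):
# each density is weighted by its share of all observations.
joint_r = dens_r * group_r.size / n_total
joint_m = dens_m * group_m.size / n_total
joint_area = ((joint_r + joint_m) * width).sum()

print(area_r, area_m)  # each group separately
print(joint_area)      # both groups together
```

With independent normalization the smaller group is drawn at the same scale as the larger one, which makes the SHAPES easier to compare.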
We can also use VIOLINS to compare the CONDITIONAL distribution SHAPES.
sns.catplot(data=sonar_lf, y="value", col="variable", x="X60", col_wrap=5,
sharey=False, hue="X60",
kind="violin")
plt.show()
sns.catplot(data=sonar_lf, y="value", col="variable", x="X60", col_wrap=5,
            sharey=False, hue="X60", linestyle="none",
            kind="point")
plt.show()
BUT there is something unique about this dataset. To see it, let's use the WIDE FORMAT Seaborn plotting.
sns.catplot(data=sonar_data, kind="box", aspect=3.5)
plt.show()
sns.catplot(data=sonar_data, kind="point", aspect=3.5, linestyle="none")
plt.show()
The WIDE format does NOT let us GROUP BY categorical variables.
The LONG format lets us GROUP BY categorical variables and associate FIGURE elements with the numeric variables!
sns.catplot(data=sonar_lf, x="variable", y="value", kind="box", aspect=3.5, hue="variable")
plt.show()
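The two formats hold the same information, so a LONG format table can be turned back into a WIDE one when needed. Here is a minimal sketch of that round trip on a hypothetical three-row frame shaped like a tiny sonar table; `toy_wide` and its values are made up, and passing a list to the index argument of pivot() assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical wide-format data shaped like a tiny sonar table.
toy_wide = pd.DataFrame({
    "X00": [0.02, 0.05, 0.03],
    "X01": [0.04, 0.05, 0.06],
    "X60": ["R", "R", "M"],
})

# WIDE -> LONG, keeping a row identifier and the categorical column.
toy_lf = (
    toy_wide.reset_index()
    .rename(columns={"index": "rowid"})
    .melt(id_vars=["rowid", "X60"], value_vars=["X00", "X01"])
)

# LONG -> WIDE again: one column per unique value of `variable`.
toy_back = (
    toy_lf.pivot(index=["rowid", "X60"], columns="variable", values="value")
    .reset_index()
)
toy_back.columns.name = None  # drop the leftover "variable" axis label

print(toy_back)
```

The rowid column is what makes the reverse step possible, because it tells pivot() which long-format rows belong to the same original observation.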
GROUP BY X60.
sns.catplot(data=sonar_lf, x="variable", y="value", kind="box", aspect=3.5, hue="X60")
plt.show()
sns.catplot(data=sonar_lf, x="variable", y="value", kind="point", aspect=3.5, hue="X60", linestyle="none")
plt.show()
Summary
If we have fewer than 10 numeric variables, we can use point plots.
If we have more than 10 numeric variables, it's hard to use point plots, and that motivates Cluster Analysis and PCA.
We have to RESHAPE the data from WIDE to LONG format in order to explore the numeric variables through FACETS.